Nonlinear Emotional Prosody Generation and Annotation
Abstract
Emotion is an important element of expressive speech synthesis. This paper briefly analyzes the prosody parameters, stress, rhythm, and paralinguistic information of different kinds of emotional speech, and labels the speech with rich multi-layer annotations. A CART model is then used to generate emotional prosody. The traditional linear modification method directly modifies F0 contours and syllable durations according to the acoustic distributions of emotional speech, such as F0 topline, F0 baseline, duration, and intensity. In contrast, the CART model tries to map the subtle prosodic differences between neutral and emotional speech across various contexts. Experiments show that the CART model can generate good emotional prosody from traditional context information alone, but that the results could be improved if richer information, such as stress, break, and jitter information, is integrated into the context.
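The contrast between linear modification and a context-dependent CART mapping can be illustrated with a minimal sketch. It is not the authors' implementation: the two context features (stressed, phrase-final), the F0 ratios, and all function names are hypothetical values chosen only for illustration. The tree is a plain regression tree that picks each split to minimize the summed squared error of the two children, which is the standard CART criterion for regression.

```python
from statistics import mean

def sse(values):
    """Sum of squared errors around the mean (CART regression impurity)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(rows, targets):
    """Return the (feature, threshold) that most reduces SSE, or None."""
    best, best_cost = None, sse(targets)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, targets) if r[f] <= t]
            right = [y for r, y in zip(rows, targets) if r[f] > t]
            if not left or not right:
                continue  # a split must leave data on both sides
            cost = sse(left) + sse(right)
            if cost < best_cost - 1e-12:
                best, best_cost = (f, t), cost
    return best

def grow(rows, targets, depth=0, max_depth=3):
    """Grow a tree: a leaf is a float, a node is (feature, threshold, left, right)."""
    split = best_split(rows, targets) if depth < max_depth else None
    if split is None:
        return mean(targets)
    f, t = split
    left = [(r, y) for r, y in zip(rows, targets) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, targets) if r[f] > t]
    return (
        f,
        t,
        grow([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        grow([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    )

def predict(tree, row):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

# Hypothetical toy data, NOT taken from the paper.
# Context features per syllable: (stressed, phrase_final), each 0 or 1.
contexts = [(0, 0), (1, 0), (0, 1), (1, 1)]
# Target: ratio of emotional F0 to neutral F0 for that syllable.
f0_ratios = [1.1, 1.4, 0.9, 1.2]

# Linear modification: one global ratio applied to every syllable.
global_ratio = mean(f0_ratios)

# CART mapping: the ratio depends on the syllable's context.
tree = grow(contexts, f0_ratios)

neutral_f0_hz = 200.0  # illustrative neutral F0 of one syllable
for ctx in contexts:
    linear = neutral_f0_hz * global_ratio
    cart = neutral_f0_hz * predict(tree, ctx)
    print(ctx, round(linear, 1), round(cart, 1))
```

In this toy setting the global ratio gives every syllable the same F0 shift, while the tree assigns a different shift to stressed and phrase-final syllables. Richer annotations such as break or jitter labels would enter simply as additional feature columns.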
Related resources
Nonlinear Emotional Prosody Generation and Emotional Tags
The paper analyzes prosody features, including intonation, speaking rate, and intensity, based on classified emotional speech. As an important feature of voice quality, the voice source is also deduced for analysis. With the analysis results above, the paper creates both a CART model and a weight decay neural network model to find acoustic importance towards the emotional speech classific...
Affective and sensorimotor components of emotional prosody generation.
Although advances have been made regarding how the brain perceives emotional prosody, the neural bases involved in the generation of affective prosody remain unclear and debated. Two models have been forged on the basis of clinical observations: a first model proposes that the right hemisphere sustains production and comprehension of emotional prosody, while a second model proposes that emotion...
Emotional Voice Conversion Using Neural Networks with Different Temporal Scales of F0 based on Wavelet Transform
An artificial neural network is one of the most important models for training features of voice conversion (VC) tasks. Typically, neural networks (NNs) are very effective in processing nonlinear features, such as mel cepstral coefficients (MCC) which represent the spectrum features. However, a simple representation for fundamental frequency (F0) is not enough for neural networks to deal with an...
Neural Processing of Emotional Prosody across the Adult Lifespan
Emotion recognition deficits emerge with increasing age, in particular a decline in the identification of sadness. However, little is known about the age-related changes of emotion processing in sensory, affective, and executive brain areas. This functional magnetic resonance imaging (fMRI) study investigated neural correlates of auditory processing of prosody across the adult lifespan. Unatte...
Emotional Voice Conversion with Adaptive Scales F0 Based on Wavelet Transform Using Limited Amount of Emotional Data
Deep learning techniques have been successfully applied to speech processing. Typically, neural networks (NNs) are very effective in processing nonlinear features, such as mel cepstral coefficients (MCC), which represent the spectrum features in voice conversion (VC) tasks. Despite these successes, the approach is restricted to problems with moderate dimension and sufficient data. Thus, in emot...